Allow heartbeat to restart the pipe thread with only sync commands#2965
Open
frobion wants to merge 2 commits intoStackExchange:mainfrom
Open
Conversation
There is a thread looping in the method PhysicalConnection.ReadFromPipe to process response from Redis, match them with the sent command and signaling the completion of the message. If this thread has an exception, its catch block will call RecordConnectionFailed which will proceed to restart a new thread to continue reading Redis responses. However, if another exception occurred in the catch before the new thread can be started (in a case of high memory pressure, OOM exceptions can happen anywhere) we are in a state where no one is reading the pipe of Redis responses, and all commands sent end in timeout. If at least one async command is sent, the heartbeat thread will detect the timeout in the OnBridgeHeartbeat method, and if no read were perform for 4 heartbeat it will issue a connection failure. With this commit, this becomes true for sync commands as well. Therefore, it ensures we will not reach a state were all commands end in timeout.
NickCraver
reviewed
Oct 14, 2025
| // This is an "always" check - we always want to evaluate a dead connection from a non-responsive sever regardless of the need to heartbeat above | ||
| if (timedOutThisHeartbeat > 0 | ||
| var totalTimeoutThisHeartbeat = asyncTimeoutThisHeartbeat + syncTimeoutThisHeartbeat; | ||
| if ((totalTimeoutThisHeartbeat > 0) |
Collaborator
There was a problem hiding this comment.
Note: this is comparing sync timeouts but using async thresholds in the line below, so we might be introducing a case in which we've violating async timeout * 4, but only had 1 sync timeout that's not severe, creating more disconnects.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
There is a thread looping in the method
PhysicalConnection.ReadFromPipeto process response from Redis, match them with the sent command and signaling the completion of the message. If this thread has an exception, its catch block will callRecordConnectionFailedwhich will proceed to restart a new thread to continue reading Redis responses.However, if another exception occurred in the catch before the new thread can be started (in a case of high memory pressure, OOM exceptions can happen anywhere) we are in a state where no one is reading the pipe of Redis responses, and all commands sent end in timeout.
If at least one async command is sent, the heartbeat thread will detect the timeout in the
OnBridgeHeartbeatmethod, and if no read were perform for 4 heartbeat it will issue a connection failure. However, no such protection were in place if only sync commands are sent. In this case, they were all ending in timeout without any mechanisms to start reading their responses again.In this PR, the heartbeat thread will check timeouts for sync commands as well. Therefore, it will be able to start the thread looping in
ReadFromPipeeven if only sync commands are sent, ensuring we will not reach a state were all commands end in timeout.This PR is loosely linked to the issue #2919, as this problem and its correction were found during investigation of the issue.
Moreover, it can be related to issue #2888 as the symptoms are similar to what I observed. However, I don't know if in this case only synchronous commands were sent or not.